124 research outputs found

    Adaptive work placement for query processing on heterogeneous computing resources

    The hardware landscape is currently changing from homogeneous multi-core systems towards heterogeneous systems with many different computing units, each with its own characteristics. This trend is a great opportunity for database systems to increase overall performance if the heterogeneous resources can be utilized efficiently. To achieve this, the main challenge is to place the right work on the right computing unit. Current approaches tackling this placement for query processing assume that data cardinalities of intermediate results can be correctly estimated. However, this assumption does not hold for complex queries. To overcome this problem, we propose an adaptive placement approach that is independent of cardinality estimates for intermediate results. Our approach is incorporated in a novel adaptive placement sequence. Additionally, we implement our approach as an extensible virtualization layer to demonstrate its broad applicability across multiple database systems. In our evaluation, we clearly show that our approach significantly improves OLAP query processing on heterogeneous hardware, while being adaptive enough to react to changing cardinalities of intermediate query results.
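
    To illustrate the general idea of placement that reacts to observed behavior rather than cardinality estimates, the following is a minimal sketch. The unit names, the placeholder kernels, and the moving-average feedback rule are assumptions for the example, not the paper's implementation.

```python
# Illustrative sketch: a feedback-driven operator placement that avoids
# cardinality estimates by measuring actual runtimes per computing unit.
import time
from collections import defaultdict

class AdaptivePlacer:
    def __init__(self, units):
        self.units = units                       # e.g. ["cpu", "gpu"]
        # running average of observed runtime per (operator, unit)
        self.observed = defaultdict(lambda: defaultdict(lambda: None))

    def choose_unit(self, operator):
        # try every unit once (exploration), then prefer the fastest so far
        stats = self.observed[operator]
        untried = [u for u in self.units if stats[u] is None]
        if untried:
            return untried[0]
        return min(self.units, key=lambda u: stats[u])

    def run(self, operator, kernels, data):
        unit = self.choose_unit(operator)
        start = time.perf_counter()
        result = kernels[unit](data)             # execute on the chosen unit
        elapsed = time.perf_counter() - start
        prev = self.observed[operator][unit]
        # exponential moving average adapts to changing intermediate results
        self.observed[operator][unit] = elapsed if prev is None else 0.7 * prev + 0.3 * elapsed
        return result, unit

placer = AdaptivePlacer(["cpu", "gpu"])
kernels = {"cpu": sum, "gpu": sum}               # placeholder kernels for the example
print(placer.run("aggregate", kernels, range(1_000_000)))
```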

    Visual Decision Support for Ensemble Clustering

    The continuing growth of data leads to major challenges for data clustering in scientific data management. Clustering algorithms must handle high data volumes and dimensionality, while users need assistance during their analyses. Ensemble clustering provides robust, high-quality results and eases algorithm selection and parameterization. However, available concepts lack facilities for result adjustment and support for result interpretation. To tackle these issues, we have previously published an extended algorithm for ensemble clustering that uses soft clusterings. In this paper, we propose a novel visualization, tightly coupled to this algorithm, that provides assistance for result adjustments and allows the interpretation of clusterings for data sets of arbitrary size.
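
    For context, a generic co-association consensus step is one common way to combine an ensemble of base clusterings. The sketch below shows that generic technique, not the soft-clustering algorithm from the paper; the input labelings and the final cluster count are made-up assumptions.

```python
# Generic co-association ensemble clustering (illustrative only).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def consensus_clustering(labelings, n_clusters):
    """labelings: list of 1-D label arrays, each a clustering of the same n objects."""
    labelings = [np.asarray(l) for l in labelings]
    n = len(labelings[0])
    # co-association: fraction of base clusterings that group objects i and j together
    coassoc = np.zeros((n, n))
    for labels in labelings:
        coassoc += (labels[:, None] == labels[None, :])
    coassoc /= len(labelings)
    # turn the similarity into a distance and cut a hierarchical clustering
    dist = 1.0 - coassoc
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(z, t=n_clusters, criterion="maxclust")

print(consensus_clustering([[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]], n_clusters=2))
```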

    Penalized Graph Partitioning based Allocation Strategy for Database-as-a-Service Systems

    Database-as-a-Service (DBaaS) systems transfer the advantages of cloud computing to data management systems, which is important for the big data era. The allocation in a DBaaS system, i.e., the mapping from databases to nodes of the infrastructure, influences the performance, utilization, and cost-effectiveness of the system. Modeling the databases and the underlying infrastructure as weighted graphs and using graph partitioning and mapping algorithms yields an allocation strategy. However, graph partitioning assumes that individual vertex weights add up linearly to partition weights. In reality, performance usually does not scale linearly with the amount of work because of contention on the hardware, on operating system resources, or on DBMS components. To overcome this issue, we propose an allocation strategy based on penalized graph partitioning. We show how existing algorithms can be modified to handle graphs with non-linear partition weights, i.e., vertex weights that do not sum up linearly to partition weights. We experimentally evaluate our allocation strategy in a DBaaS system with 1,000 databases on 32 nodes.
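
    The core idea of a non-linear partition weight can be sketched with a simple superlinear penalty: co-locating more work on a node costs more than the plain sum of its vertex weights. The penalty exponent and the greedy heuristic below are illustrative assumptions, not the modified partitioning algorithm from the paper.

```python
# Sketch: non-linear ("penalized") partition weights modeling contention.

def penalized_weight(vertex_weights, alpha=1.2):
    # plain graph partitioning would use sum(vertex_weights);
    # the exponent penalizes partitions that co-locate a lot of work
    return sum(vertex_weights) ** alpha

def greedy_allocate(db_weights, n_nodes, alpha=1.2):
    """Assign database weights to nodes, minimizing the maximum penalized load."""
    partitions = [[] for _ in range(n_nodes)]
    for w in sorted(db_weights, reverse=True):       # largest databases first
        best = min(
            range(n_nodes),
            key=lambda i: penalized_weight(partitions[i] + [w], alpha),
        )
        partitions[best].append(w)
    return partitions

print(greedy_allocate([8, 5, 5, 3, 2, 2, 1], n_nodes=3))
```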

    Standing Processes in Service-Oriented Environments

    Current realization techniques for service-oriented architectures (SOA) and business process management (BPM) cannot be applied efficiently to every kind of application scenario. For example, an important requirement in the finance sector is the continuous evaluation of stock prices to automatically trigger business processes--e.g. the buying or selling of stocks--with regard to several strategies. In this paper, we address the continuous evaluation of message streams within BPM to establish a common environment for stream-based message processing and traditional business processes. In detail, we propose the notion of standing processes as (i) a process-centric concept for the interpretation of message streams, and (ii) a trigger element for subsequent business processes. The demonstration system focuses on the execution of standing processes and their smooth interaction with the traditional business process environment.
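
    As a rough illustration of the concept, a standing process can be thought of as a long-running evaluation over a message stream that triggers a conventional business process once a condition holds. The quote format, the threshold rule, and trigger_process() below are hypothetical stand-ins, not the demonstrated system's interfaces.

```python
# Sketch: a standing process that interprets a stock-quote stream and
# triggers a subsequent business process when a buy condition fires.

def trigger_process(name, payload):
    print(f"starting business process {name!r} with {payload}")

def standing_process(stream, symbol, buy_below):
    """Continuously evaluate quote messages and trigger a buy process."""
    for msg in stream:                        # e.g. dicts {"symbol": ..., "price": ...}
        if msg["symbol"] == symbol and msg["price"] < buy_below:
            trigger_process("BuyStocks", msg)

quotes = [
    {"symbol": "ACME", "price": 102.0},
    {"symbol": "ACME", "price": 97.5},        # fires the trigger
]
standing_process(quotes, symbol="ACME", buy_below=100.0)
```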

    Multi-flow Optimization via Horizontal Message Queue Partitioning

    Integration flows are increasingly used to specify and execute data-intensive integration tasks between heterogeneous systems and applications. There are many different application areas, such as near real-time ETL and data synchronization between operational systems. Due to increasing data volumes, highly distributed IT infrastructures, and high requirements on the up-to-dateness of analytical query results and on data consistency, many instances of integration flows are executed over time. Because of this high load, the performance of the central integration platform is crucial for an IT infrastructure. With the aim of throughput maximization, we propose the concept of multi-flow optimization (MFO). In this approach, messages are collected during a waiting time and executed in batches to optimize sequences of plan instances of a single integration flow. We introduce a horizontal (value-based) partitioning approach for message batch creation and show how to compute the optimal waiting time. This approach significantly reduces the total execution time of a message sequence and hence maximizes throughput, while accepting a moderate increase in latency.
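
    The batching idea can be sketched as follows: messages gathered during the waiting time are grouped by a partitioning attribute, and each group is executed as one batched plan instance. The attribute name, the sample messages, and execute_batch() are assumptions for the example; the waiting-time computation itself is not shown.

```python
# Sketch: horizontal (value-based) partitioning of a collected message batch.
from collections import defaultdict

def execute_batch(key, messages):
    print(f"one plan instance for partition {key!r}: {len(messages)} messages")

def multi_flow_execute(collected_messages, partition_attr):
    partitions = defaultdict(list)
    for msg in collected_messages:                 # messages gathered while waiting
        partitions[msg[partition_attr]].append(msg)
    for key, batch in partitions.items():          # one execution per partition
        execute_batch(key, batch)

msgs = [{"customer": "A", "qty": 1}, {"customer": "B", "qty": 2}, {"customer": "A", "qty": 5}]
multi_flow_execute(msgs, partition_attr="customer")
```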

    Limitations of Intra-operator Parallelism Using Heterogeneous Computing Resources

    The hardware landscape is changing from homogeneous multi-core systems towards wildly heterogeneous systems combining different computing units, like CPUs and GPUs. To utilize these heterogeneous environments, database query execution has to adapt to cope with different architectures and computing behaviors. In this paper, we investigate the simple idea of partitioning an operator's input data and processing all data partitions in parallel, one partition per computing unit. For heterogeneous systems, data has to be partitioned according to the performance of the computing units. We define a way to calculate the partition sizes, analyze the parallel execution exemplarily for two database operators, and present limitations that could hinder significant performance improvements. The findings in this paper can help system developers assess the possibilities and limitations of intra-operator parallelism in heterogeneous environments, leading to more informed decisions about whether this approach is beneficial for a given workload and hardware environment.
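
    One natural way to size the partitions, assumed here purely for illustration, is proportionally to each computing unit's throughput, so that all units finish at roughly the same time. The throughput figures are made up and this is not necessarily the exact calculation defined in the paper.

```python
# Sketch: throughput-proportional partition sizes for heterogeneous units.

def partition_sizes(total_rows, throughputs):
    """throughputs: rows/second per computing unit, e.g. {"cpu": 2e6, "gpu": 6e6}."""
    total_tp = sum(throughputs.values())
    sizes = {u: int(total_rows * tp / total_tp) for u, tp in throughputs.items()}
    # give any rounding remainder to the fastest unit
    fastest = max(throughputs, key=throughputs.get)
    sizes[fastest] += total_rows - sum(sizes.values())
    return sizes

print(partition_sizes(10_000_000, {"cpu": 2e6, "gpu": 6e6}))
```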

    An XML-Based Streaming Concept for Business Process Execution

    Service-oriented environments are the central backbone of today's enterprise workflows. These workflows include traditional process types, like travel booking or order processing, as well as data-intensive integration processes, like operational business intelligence and data analytics. For the latter process types, current execution semantics and concepts do not scale well in terms of performance and resource consumption. In this paper, we present a concept for data streaming in business processes that is inspired by the typical execution semantics in data management environments. To that end, we present a conceptual process and execution model that leverages the idea of stream-based service invocation for scalable and efficient process execution. Selected evaluation results show that it outperforms the execution model of current process engines.
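
    The general idea of stream-based service invocation can be sketched as a pipeline in which items flow through the process steps as they arrive instead of being materialized as one complete message between steps. The services and the order data below are hypothetical placeholders, not the engine's actual execution model.

```python
# Sketch: items are pushed through process steps as a stream (generators),
# so no step waits for the full message to be assembled.

def parse_orders(xml_fragments):
    for fragment in xml_fragments:           # pretend each fragment is one order
        yield {"order": fragment}

def enrich(orders):
    for o in orders:
        o["priority"] = "high" if "urgent" in o["order"] else "normal"
        yield o

def invoke_shipping_service(orders):
    for o in orders:
        print("invoke shipping for", o)      # one invocation per streamed item

invoke_shipping_service(enrich(parse_orders(["<order>urgent</order>", "<order>bulk</order>"])))
```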

    GignoMDA

    Database systems are often used as the persistence layer for applications. This implies that database schemas are generated out of transient programming class descriptions. The basic idea of the MDA approach generalizes this principle by providing a framework to generate applications (and database schemas) for different programming platforms. Within our GignoMDA project [3]--which is the subject of this demo proposal--we have extended classic concepts for code generation. That is, our approach provides a single point of truth describing all aspects of database applications (e.g. database schema, project documentation,...) with great potential for cross-layer optimization. These cross-layer optimization hints offer a novel way to address the challenging global optimization issue of multi-tier database applications. The demo at VLDB comprises an in-depth explanation of our concepts and of the prototypical implementation, directly demonstrating the modeling and the automatic generation of database applications.
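
    In the spirit of generating a database schema from a class description, here is a minimal sketch; the example class, the type mapping, and the generator function are simplified assumptions and do not reflect GignoMDA's actual generator.

```python
# Sketch: deriving DDL from a transient class model.
from dataclasses import dataclass, fields

TYPE_MAP = {int: "INTEGER", str: "VARCHAR(255)", float: "DOUBLE PRECISION"}

@dataclass
class Customer:          # the "single point of truth": a class model
    id: int
    name: str
    discount: float

def generate_ddl(cls):
    cols = ", ".join(f"{f.name} {TYPE_MAP[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__.lower()} ({cols});"

print(generate_ddl(Customer))
```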

    A Benchmark Framework for Data Compression Techniques

    Lightweight data compression is frequently applied in main memory database systems to improve query performance. The data processed by such systems is highly diverse, and a large number of lightweight compression techniques exist. Therefore, choosing the optimal technique for a given dataset is non-trivial. Existing approaches are based on simple rules, which do not suffice for such a complex decision. In contrast, our vision is a cost-based approach. However, this requires a detailed cost model, which can only be obtained from a systematic benchmarking of many compression algorithms on many different datasets. A naïve benchmark evaluates every algorithm under consideration separately. This yields many redundant steps and is thus inefficient. We propose an efficient and extensible benchmark framework for compression techniques. Given an ensemble of algorithms, it minimizes the overall run time of the evaluation. We experimentally show that our approach outperforms the naïve approach.
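
    The efficiency idea, avoiding redundant steps by generating each dataset once and reusing it for every algorithm, can be sketched as below. The zlib/bz2/lzma algorithms and the synthetic data generator are stand-ins for the example; the paper targets lightweight column compression schemes rather than these general-purpose compressors.

```python
# Sketch: each dataset is generated once and shared across all algorithms,
# instead of being regenerated per algorithm as in a naive benchmark.
import bz2, lzma, random, time, zlib

ALGORITHMS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}

def make_dataset(n, distinct):
    random.seed(0)
    return bytes(random.randrange(distinct) for _ in range(n))

def benchmark(dataset_specs):
    for name, spec in dataset_specs.items():
        data = make_dataset(**spec)                 # generated once, reused below
        for algo, compress in ALGORITHMS.items():
            start = time.perf_counter()
            compressed = compress(data)
            elapsed = time.perf_counter() - start
            print(f"{name:>11} {algo:>5}: ratio={len(data)/len(compressed):5.2f} "
                  f"time={elapsed*1000:6.1f} ms")

benchmark({"few_values": {"n": 100_000, "distinct": 4},
           "many_values": {"n": 100_000, "distinct": 200}})
```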